The dataset atp_tennis contains the following columns: ['Tournament', 'Date', 'Series', 'Court', 'Surface', 'Round', 'Best of', 'Player_1', 'Player_2', 'Winner', 'Rank_1', 'Rank_2', 'Pts_1', 'Pts_2', 'Odd_1', 'Odd_2', 'score'].
--- The description of each column in this dataset is:
Tournament: Name of the tennis tournament (Brisbane International, Chennai Open, Qatar Exxon Mobil Open ...etc)
Date: Date the match was played (year-month-day)
Series: Category or level of the tennis tournament (ATP250, ATP500, Masters1000 and Grand Slams offer 250, 500, 1000, and 2000 ranking points to the winner, respectively.)
Court: Place the match was held (Indoors or Outdoors)
Surface: Type of court surface (Hard, Grass and Clay)
Round: Stage of the tournament (1st Round, 2nd Round, Quarterfinals, Semifinals and The Final)
Best of: Match format ("best of 3" or "best of 5")
Player_1: Name of the first competitor in each match
Player_2: Name of the second competitor in each match
Winner: Name of the Player_1 or Player_2 who won the match 
Rank_1: World ranking of Player_1 at the time of the match
Rank_2: World ranking of Player_2 at the time of the match
Pts_1: Ranking points held by Player_1 before the match (accumulated over the season and higher points typically indicate a more successful player)
Pts_2: Ranking points held by Player_2 before the match
Odd_1: Betting odds for Player_1 (reflecting their perceived chances of winning)
Odd_2: Betting odds for Player_2 
score: Final score of the match ("score of Player_1" - "score of Player_2")
---
Consider the contents of the dataset and the requirements from the user. Please note: DO NOT CHANGE FILE AND VARIABLE NAMES THAT I HAVE SET!
Interactions begin:
[USER (data scientist)]: Can you help me find the matches in the atp_tennis dataset where Roger Federer is either Player_1 or Player_2? I would like to have a list of Match IDs for these matches.

[YOU (AI assistant)]: Sure, I can help you with that. To find the matches where Roger Federer is either Player_1 or Player_2, you can use the following code:
'''
import pandas as pd
import numpy as np
import pickle
from decision_company import read_csv_file, assert_series, logical_or, to_list, fetch_index

atp_tennis = read_csv_file('atp_tennis.csv')

# Filter the DataFrame
federer_matches = atp_tennis[logical_or(assert_series(atp_tennis['Player_1'], 'Federer R.', 'equality'), assert_series(atp_tennis['Player_2'], 'Federer R.', 'equality'))]

# Get the index using the custom function
federer_match_ids_index = fetch_index(federer_matches)

# Convert the index to a list
federer_match_ids = to_list(federer_match_ids_index)

print(federer_match_ids)

'''
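For reference, the same filter can be sketched in plain pandas. This is only an illustration on made-up rows: it assumes that `assert_series(..., 'equality')` and `logical_or` behave like `==` and `|` on Series, which the private `decision_company` module does not let us confirm, and `toy_matches` and `toy_federer_match_ids` are hypothetical names.

```python
import pandas as pd

# Toy rows standing in for atp_tennis.csv (hypothetical data).
toy_matches = pd.DataFrame({
    'Player_1': ['Federer R.', 'Nadal R.', 'Raonic M.'],
    'Player_2': ['Raonic M.', 'Federer R.', 'Nadal R.'],
})

# Boolean mask: Federer appears on either side of the match.
is_federer = (toy_matches['Player_1'] == 'Federer R.') | (toy_matches['Player_2'] == 'Federer R.')

# The row index of the filtered frame plays the role of the Match ID.
toy_federer_match_ids = toy_matches[is_federer].index.tolist()
print(toy_federer_match_ids)
```

Because the Match IDs are row labels of the original frame, they can later be passed to label-based lookups on `atp_tennis` without resetting the index.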

[USER (data scientist)]: Great! We have a new competitor named 'Raonic M.'. I need to find the win rates, head-to-head records against other players in the affected matches, and performance on the same surface as the tournament for this new competitor. Please provide a dataframe with Player Names, Win Rates, Head-to-Head Records, and Surface Performance.

[YOU (AI assistant)]: To find the win rates, head-to-head records, and surface performance for 'Raonic M.', you can use the following code:
'''
import pandas as pd
import numpy as np
import pickle
from decision_company import read_csv_file, logical_and, logical_or, create_dataframe, concatenate_objects, search_where, extract_unique_values, update_dict, assert_series

atp_tennis = read_csv_file('atp_tennis.csv')

# Find Federer matches
federer_matches = atp_tennis[logical_or(assert_series(atp_tennis['Player_1'], 'Federer R.', 'equality'), assert_series(atp_tennis['Player_2'], 'Federer R.', 'equality'))]

# Get Federer opponents
federer_opponents = extract_unique_values(search_where(federer_matches['Player_1'], assert_series(federer_matches['Player_1'], 'Federer R.', 'inequality'), federer_matches['Player_2']))

# Retrieve unique surface types
surface_types = extract_unique_values(atp_tennis['Surface'])

# Function to calculate a tennis player's overall win rate, head-to-head record, and performance on different surfaces against a specific opponent.
def calculate_performance(player, opponent, data, surface_types):
    player_matches = data[logical_or(logical_and(assert_series(data['Player_1'], player, 'equality'), assert_series(data['Player_2'], opponent, 'equality')), logical_and(assert_series(data['Player_1'], opponent, 'equality'), assert_series(data['Player_2'], player, 'equality')))]
    total_matches = len(player_matches)
    wins = len(player_matches[assert_series(player_matches['Winner'], player, 'equality')])
    win_rate = wins / total_matches if total_matches > 0 else 0

    h2h_record = wins / total_matches if total_matches > 0 else 0

    surface_performance = {}
    for surface in surface_types:
        player_surface_matches = player_matches[assert_series(player_matches['Surface'], surface, 'equality')]
        player_surface_wins = len(player_surface_matches[assert_series(player_surface_matches['Winner'], player, 'equality')])
        surface_performance[surface] = player_surface_wins / player_surface_matches.shape[0] if player_surface_matches.shape[0] > 0 else 0

    return win_rate, h2h_record, surface_performance

# Initialize an empty dataframe for storing the results
columns = ['Player', 'Opponent', 'Win Rate', 'Head-to-Head'] + [f'{surface} Surface Performance' for surface in surface_types]
player_stats = create_dataframe(data=[], columns=columns)

# Loop through Federer's opponents
for opponent in federer_opponents:
    # Calculate win rate, head-to-head record, and surface performance for Raonic M. against each opponent
    win_rate, h2h_record, surface_performance = calculate_performance('Raonic M.', opponent, atp_tennis, surface_types)

    # Append the results to the dataframe
    row_data = {'Player': 'Raonic M.', 'Opponent': opponent, 'Win Rate': win_rate, 'Head-to-Head': h2h_record}
    update_dict(row_data, {f'{surface} Surface Performance': surface_performance[surface] for surface in surface_types})
    player_stats = concatenate_objects(player_stats, create_dataframe(row_data, index=[0]))

print(player_stats)

'''
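Note that `calculate_performance` derives both 'Win Rate' and 'Head-to-Head' from the same set of matches between the player and one opponent, so the two columns hold the same value. A minimal plain-pandas sketch of that per-opponent calculation, on made-up rows and with the hypothetical helper `toy_win_rate`, could look like this:

```python
import pandas as pd

# Made-up matches; not taken from atp_tennis.csv.
toy = pd.DataFrame({
    'Player_1': ['Raonic M.', 'Nadal R.', 'Raonic M.', 'Raonic M.'],
    'Player_2': ['Nadal R.', 'Raonic M.', 'Nadal R.', 'Murray A.'],
    'Winner':   ['Raonic M.', 'Nadal R.', 'Nadal R.', 'Raonic M.'],
})

def toy_win_rate(player, opponent, data):
    """Share of matches between player and opponent that player won (0.0 if they never met)."""
    met = (((data['Player_1'] == player) & (data['Player_2'] == opponent)) |
           ((data['Player_1'] == opponent) & (data['Player_2'] == player)))
    pair_matches = data[met]
    if len(pair_matches) == 0:
        return 0.0
    return float((pair_matches['Winner'] == player).mean())

nadal_rate = toy_win_rate('Raonic M.', 'Nadal R.', toy)
murray_rate = toy_win_rate('Raonic M.', 'Murray A.', toy)
print(nadal_rate, murray_rate)
```

In the toy data Raonic wins one of three meetings with Nadal and his only meeting with Murray; an opponent he never faced yields 0.0, matching the `else 0` fallback in the code above.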

[USER (data scientist)]: Thank you. Now, I need to update the odds for each affected match, considering the new competitor's performance (win rate). Please provide a dataframe with Match IDs, Original Odds, and Updated Odds, where the odds are adjusted based on the new competitor's win rate and a performance weight factor.

[YOU (AI assistant)]: To update the odds for each affected match, considering the new competitor's performance, you can use the following code:
'''
import pandas as pd
import numpy as np
import pickle
from decision_company import read_csv_file, concatenate_objects, create_dataframe, bind_dataframe, reset_index, rename_columns, access_dataframe_loc, join_dataframes, get_max, assert_series, extract_unique_values, is_a_null_df, visit_by_index, fetch_df_size, create_multiindex_from_product, convert_multiindex_to_dataframe, remove_labels, fill_missing_values

atp_tennis = read_csv_file('atp_tennis.csv')

# Function to calculate head-to-head win rates between players
def calculate_head_to_head_win_rate(data):
    unique_players = extract_unique_values(concatenate_objects(data['Player_1'], data['Player_2']))

    total_matches = reset_index(fetch_df_size(bind_dataframe(data, ['Player_1', 'Player_2'])), drop=False)
    total_matches = rename_columns(total_matches, columns={0: 'count_total'})

    player_wins = fetch_df_size(bind_dataframe(data[assert_series(data['Winner'], data['Player_1'], 'equality')], ['Player_1', 'Player_2']))
    player_wins = reset_index(player_wins, drop=False)
    player_wins = rename_columns(player_wins, {0: 'count_wins'})

    multiindex = create_multiindex_from_product([unique_players, unique_players], names=['Player', 'Opponent'])
    all_pairs = convert_multiindex_to_dataframe(multiindex, index=False)
    all_pairs = all_pairs[assert_series(all_pairs['Player'], all_pairs['Opponent'], 'inequality')]

    h2h_win_rates = remove_labels(join_dataframes(all_pairs, total_matches, left_on=['Player', 'Opponent'], right_on=['Player_1', 'Player_2'], how='left'), columns=['Player_1', 'Player_2'])
    h2h_win_rates = remove_labels(join_dataframes(h2h_win_rates, player_wins, left_on=['Player', 'Opponent'], right_on=['Player_1', 'Player_2'], how='left'), columns=['Player_1', 'Player_2'])

    fill_missing_values(h2h_win_rates['count_wins'], value=0, inplace=True)
    fill_missing_values(h2h_win_rates['count_total'], value=1, inplace=True)

    h2h_win_rates['Head-to-Head Win Rate'] = h2h_win_rates['count_wins'] / h2h_win_rates['count_total']
    h2h_win_rates = remove_labels(h2h_win_rates, columns=['count_total', 'count_wins'])

    return h2h_win_rates

# Function to estimate win probability based on individual and head-to-head win rates
def estimate_probability(player_1_win_rate, player_2_win_rate, h2h_player_1_win_rate, h2h_player_2_win_rate, performance_weight=0.7, h2h_weight=0.3):
    if player_1_win_rate + player_2_win_rate == 0:
        player_1_win_rate = player_2_win_rate = 0.5

    if h2h_player_1_win_rate + h2h_player_2_win_rate == 0:
        h2h_player_1_win_rate = h2h_player_2_win_rate = 0.5

    weighted_player_1_win_rate = performance_weight * player_1_win_rate + h2h_weight * h2h_player_1_win_rate
    weighted_player_2_win_rate = performance_weight * player_2_win_rate + h2h_weight * h2h_player_2_win_rate
    total_weighted_win_rate = weighted_player_1_win_rate + weighted_player_2_win_rate

    player_1_probability = weighted_player_1_win_rate / total_weighted_win_rate
    player_2_probability = weighted_player_2_win_rate / total_weighted_win_rate

    return player_1_probability, player_2_probability

# Function to update odds for a match based on the calculated probabilities
def update_odds(match_id, performance_weight, h2h_weight, data, h2h_win_rates):
    player_1 = access_dataframe_loc(data, match_id, 'Player_1')
    player_2 = access_dataframe_loc(data, match_id, 'Player_2')

    player_1_win_rate = access_dataframe_loc(player_stats, assert_series(player_stats['Opponent'], player_1, 'equality'), 'Win Rate')
    player_2_win_rate = access_dataframe_loc(player_stats, assert_series(player_stats['Opponent'], player_2, 'equality'), 'Win Rate')

    player_1_win_rate = visit_by_index(player_1_win_rate, 0) if not is_a_null_df(player_1_win_rate) else 0
    player_2_win_rate = visit_by_index(player_2_win_rate, 0) if not is_a_null_df(player_2_win_rate) else 0

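    # Match on both Player and Opponent so each lookup returns the rate for this specific pairing
    h2h_player_1_win_rate = access_dataframe_loc(h2h_win_rates, assert_series(h2h_win_rates['Player'], player_1, 'equality') & assert_series(h2h_win_rates['Opponent'], player_2, 'equality'), 'Head-to-Head Win Rate')
    h2h_player_2_win_rate = access_dataframe_loc(h2h_win_rates, assert_series(h2h_win_rates['Player'], player_2, 'equality') & assert_series(h2h_win_rates['Opponent'], player_1, 'equality'), 'Head-to-Head Win Rate')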

    h2h_player_1_win_rate = visit_by_index(h2h_player_1_win_rate, 0) if not is_a_null_df(h2h_player_1_win_rate) else 0
    h2h_player_2_win_rate = visit_by_index(h2h_player_2_win_rate, 0) if not is_a_null_df(h2h_player_2_win_rate) else 0

    player_1_probability, player_2_probability = estimate_probability(player_1_win_rate, player_2_win_rate, h2h_player_1_win_rate, h2h_player_2_win_rate, performance_weight, h2h_weight)
    max_odd_1 = get_max(data['Odd_1'])
    max_odd_2 = get_max(data['Odd_2'])

    if player_1_probability == 0:
        updated_odd_1 = max_odd_1
    else:
        updated_odd_1 = 1 / player_1_probability

    if player_2_probability == 0:
        updated_odd_2 = max_odd_2
    else:
        updated_odd_2 = 1 / player_2_probability

    return updated_odd_1, updated_odd_2

# Calculate head-to-head win rates
h2h_win_rates = calculate_head_to_head_win_rate(atp_tennis)

# Initialize an empty DataFrame for storing the results
updated_odds_df = create_dataframe(data=[], columns=['Match ID', 'Original Odd_1', 'Original Odd_2', 'Updated Odd_1', 'Updated Odd_2'])

# Loop through the affected matches
for match_id in federer_match_ids:
    # Update the odds for the match
    updated_odd_1, updated_odd_2 = update_odds(match_id, 0.7, 0.3, atp_tennis, h2h_win_rates)

    # Append the results to the DataFrame
    original_odd_1 = access_dataframe_loc(atp_tennis, match_id, 'Odd_1')
    original_odd_2 = access_dataframe_loc(atp_tennis, match_id, 'Odd_2')
    updated_odds_df = concatenate_objects(updated_odds_df, create_dataframe(data={'Match ID': [match_id], 'Original Odd_1': [original_odd_1], 'Original Odd_2': [original_odd_2], 'Updated Odd_1': [updated_odd_1], 'Updated Odd_2': [updated_odd_2]}), reset_index_flag=True)

print(updated_odds_df)

'''
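To make the arithmetic concrete, here is a small worked sketch of the weighted blend and the conversion to decimal odds. The input rates are illustrative values, not results from the dataset, and `blend_probabilities` is a hypothetical stand-in that only mirrors the weighting step of `estimate_probability`.

```python
def blend_probabilities(rate_1, rate_2, h2h_1, h2h_2,
                        performance_weight=0.7, h2h_weight=0.3):
    """Hypothetical helper: weighted blend of win rate and head-to-head rate, normalised to sum to 1."""
    weighted_1 = performance_weight * rate_1 + h2h_weight * h2h_1
    weighted_2 = performance_weight * rate_2 + h2h_weight * h2h_2
    total = weighted_1 + weighted_2
    return weighted_1 / total, weighted_2 / total

# Illustrative inputs: win rates 0.6 vs 0.4, neutral head-to-head of 0.5 each.
prob_1, prob_2 = blend_probabilities(0.6, 0.4, 0.5, 0.5)

# Decimal odds are taken as the reciprocal of the probability, as in update_odds.
odd_1, odd_2 = 1 / prob_1, 1 / prob_2
print(round(prob_1, 2), round(prob_2, 2), round(odd_1, 2), round(odd_2, 2))
```

With these inputs the blended probabilities are about 0.57 and 0.43, giving odds of roughly 1.75 and 2.33. Since the two probabilities sum to 1, the updated odds carry no bookmaker margin, which is worth keeping in mind when comparing them with the original odds.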

[USER (data scientist)]: That's great. I need to compare the updated odds for the affected matches to the original odds. Please provide a dataframe with Match IDs, Original Odds, Updated Odds, and Comparison Metrics (Difference and Percentage Change).

[YOU (AI assistant)]: To compare the updated odds with the original odds for the affected matches, you can use the following code:
'''
import pandas as pd
import numpy as np
import pickle
from decision_company import read_csv_file

atp_tennis = read_csv_file('atp_tennis.csv')

# Fetching specific columns ('Original Odd_1', 'Original Odd_2', 'Updated Odd_1', 'Updated Odd_2') from the dataframe.
original_odd_1 = updated_odds_df['Original Odd_1']
original_odd_2 = updated_odds_df['Original Odd_2']
updated_odd_1 = updated_odds_df['Updated Odd_1']
updated_odd_2 = updated_odds_df['Updated Odd_2']

# Calculating the differences between updated and original odds for both odds 1 and 2.
difference_odd_1 = updated_odd_1 - original_odd_1
difference_odd_2 = updated_odd_2 - original_odd_2

# Calculating the percentage change for both sets of odds.
percentage_change_odd_1 = (difference_odd_1 / original_odd_1) * 100
percentage_change_odd_2 = (difference_odd_2 / original_odd_2) * 100

# Adding the calculated differences and percentage changes as new columns to the dataframe.
updated_odds_df['Difference Odd_1'] = difference_odd_1
updated_odds_df['Difference Odd_2'] = difference_odd_2
updated_odds_df['Percentage Change Odd_1'] = percentage_change_odd_1
updated_odds_df['Percentage Change Odd_2'] = percentage_change_odd_2

print(updated_odds_df)

'''
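As a quick sanity check of the two comparison metrics, the sketch below applies the same formulas to a two-row frame with made-up odds (`toy_odds` is a hypothetical name and the numbers are not from the dataset).

```python
import pandas as pd

# Made-up original and updated odds for two matches.
toy_odds = pd.DataFrame({
    'Original Odd_1': [2.0, 1.5],
    'Updated Odd_1': [2.5, 1.2],
})

# Difference: updated minus original, so a positive value means the odds lengthened.
toy_odds['Difference Odd_1'] = toy_odds['Updated Odd_1'] - toy_odds['Original Odd_1']

# Percentage change relative to the original odds.
toy_odds['Percentage Change Odd_1'] = toy_odds['Difference Odd_1'] / toy_odds['Original Odd_1'] * 100

print(toy_odds.round(2))
```

The first match moves from 2.0 to 2.5 (a change of +0.5, or +25%), and the second from 1.5 to 1.2 (a change of -0.3, or -20%).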

[USER (data scientist)]: Thank you. Now, I need to know the potential impact of the updated odds on the later rounds of the tournament, specifically the quarterfinals, semifinals, and finals. Please provide a dataframe with Round, Original Odds, Updated Odds, and Potential Impact Metrics (Changes in Match Favorites, Shifts in Tournament Dynamics). The computation for match favorites is based on the assumption that the player with the lower odds is more likely to win the match. And don't consider the impact over time.

[YOU (AI assistant)]: To analyze the potential impact of the updated odds on the later rounds of the tournament (quarterfinals, semifinals, and finals), you can use the following code:
'''
import pandas as pd
import numpy as np
import pickle
from decision_company import read_csv_file, check_elements_in_list, join_dataframes, sum_up, use_function

atp_tennis = read_csv_file('atp_tennis.csv')

# Filter the affected matches for quarterfinals, semifinals, and finals (the final is labelled 'The Final' in the Round column)
later_rounds = atp_tennis[check_elements_in_list(atp_tennis['Round'], ['Quarterfinals', 'Semifinals', 'The Final'])]

# Merge the original odds and updated odds dataframes on Match ID
later_rounds_odds = join_dataframes(later_rounds, updated_odds_df[['Match ID', 'Updated Odd_1', 'Updated Odd_2']], left_index=True, right_on='Match ID')

# Calculate the original and updated match favorites
later_rounds_odds['Original Favorite'] = use_function(later_rounds_odds, lambda row: row['Player_1'] if row['Odd_1'] < row['Odd_2'] else row['Player_2'], axis=1)
later_rounds_odds['Updated Favorite'] = use_function(later_rounds_odds, lambda row: row['Player_1'] if row['Updated Odd_1'] < row['Updated Odd_2'] else row['Player_2'], axis=1)

# Calculate the changes in match favorites and shifts in tournament dynamics
later_rounds_odds['Change in Favorite'] = later_rounds_odds['Original Favorite'] != later_rounds_odds['Updated Favorite']
later_rounds_odds['Shift in Tournament Dynamics'] = sum_up(later_rounds_odds['Change in Favorite'])

# Create a dataframe with the required columns
impact_df = later_rounds_odds[['Match ID', 'Round', 'Odd_1', 'Odd_2', 'Updated Odd_1', 'Updated Odd_2', 'Original Favorite', 'Updated Favorite', 'Change in Favorite', 'Shift in Tournament Dynamics']]

print(impact_df)

'''
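The favorite rule (the player with the lower odds is the favorite) can be illustrated on two made-up matches. This sketch uses plain pandas and NumPy instead of `use_function`, and every value in `toy_rounds` is hypothetical.

```python
import numpy as np
import pandas as pd

# Two made-up later-round matches with original and updated odds.
toy_rounds = pd.DataFrame({
    'Round': ['Semifinals', 'The Final'],
    'Player_1': ['Federer R.', 'Federer R.'],
    'Player_2': ['Nadal R.', 'Murray A.'],
    'Odd_1': [1.4, 2.2],
    'Odd_2': [2.8, 1.6],
    'Updated Odd_1': [2.1, 2.5],
    'Updated Odd_2': [1.9, 1.5],
})

# Lower odds mean the player is considered more likely to win.
toy_rounds['Original Favorite'] = np.where(
    toy_rounds['Odd_1'] < toy_rounds['Odd_2'], toy_rounds['Player_1'], toy_rounds['Player_2'])
toy_rounds['Updated Favorite'] = np.where(
    toy_rounds['Updated Odd_1'] < toy_rounds['Updated Odd_2'], toy_rounds['Player_1'], toy_rounds['Player_2'])

# A change in favorite is flagged per match; the total count is the shift metric.
toy_rounds['Change in Favorite'] = toy_rounds['Original Favorite'] != toy_rounds['Updated Favorite']
total_shift = int(toy_rounds['Change in Favorite'].sum())
print(toy_rounds[['Round', 'Original Favorite', 'Updated Favorite', 'Change in Favorite']])
print(total_shift)
```

Here only the semifinal flips its favorite, so the shift count is 1. As in the code above, ties in the odds fall to Player_2 because the comparison is a strict less-than.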

[USER (data scientist)]: Thank you. Finally, I need to know how the overall competitiveness of the tournament changes based on the updated odds compared to the original odds. Please provide summary statistics and visualizations (Density plot for Distribution of Odds, Bar plot for Changes in Match Favorites, Line plot for Shifts in Tournament Dynamics).

My template of code snippet is:

---BEGIN CODE TEMPLATE---

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pickle
from decision_company import read_csv_file,

# please import the necessary private functions from decision_company first

atp_tennis = read_csv_file('atp_tennis.csv')

create_figure((12, 6))

# YOUR SOLUTION BEGIN:
<code1>
[COMPLETE YOUR CODE]
</code1>
# YOUR SOLUTION END

# save data
save_plot('./pred_result/updated_odds_change_for_player_1.png')
show_plots()

create_figure((12, 6))

# YOUR SOLUTION BEGIN:
<code2>
[COMPLETE YOUR CODE]
</code2>
# YOUR SOLUTION END

# save data
save_plot('./pred_result/updated_odds_change_for_player_2.png')
show_plots()

create_figure((12, 6))

# YOUR SOLUTION BEGIN:
<code3>
[COMPLETE YOUR CODE]
</code3>
# YOUR SOLUTION END

# save data
save_plot('./pred_result/changes_in_match_favorites.png')
show_plots()

create_figure((12, 6))

# YOUR SOLUTION BEGIN:
<code4>
[COMPLETE YOUR CODE]
</code4>
# YOUR SOLUTION END

# save data
save_plot('./pred_result/shifts_in_tournament_dynamics.png')
show_plots()

---END CODE TEMPLATE---

Please note that you have to generate the WHOLE python code instead of code segments based on the code snippet using Pandas library 2.0.3 version and Matplotlib library 3.7.4 version. You must keep all comments in code snippet unchanged.

[YOU (AI assistant)]: To analyze the overall competitiveness of the tournament based on the updated odds compared to the original odds, and provide summary statistics and visualizations, you can use the following code:
'''
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pickle
from decision_company import read_csv_file, create_figure, save_plot, show_plots

# please import the necessary private functions from decision_company first

atp_tennis = read_csv_file('atp_tennis.csv')

create_figure((12, 6))

# YOUR SOLUTION BEGIN:
